# BAT-Chain: Bayesian-Aware Transport Chain for Topic Hierarchies Discovery

This repository contains the code for our paper "BAT-Chain: Bayesian-Aware Transport Chain for Topic Hierarchies Discovery", submitted to ICLR 2023.

## Getting Started
### Install
- Install PyTorch with CUDA support, along with any other requirements you need.
```bash
pip install torch torchvision torchaudio
```
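
After installing, it can be useful to confirm that PyTorch is importable and whether it sees a GPU. The snippet below is a minimal sketch that uses only standard PyTorch calls; the helper name `describe_torch_install` is ours and is not part of this repository.

```python
import importlib.util


def describe_torch_install():
    """Return a one-line summary of the local PyTorch installation."""
    # Check for the package first so the script does not crash when it is missing.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if torch.cuda.is_available():
        # Report the first visible CUDA device.
        name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} with CUDA, device: {name}"
    return f"PyTorch {torch.__version__} (CPU only)"


print(describe_torch_install())
```

If the summary reports "CPU only" on a machine with a GPU, the installed wheel probably does not match your CUDA version.
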
### Dataset
- Datasets in our paper

We provide the 20ng (20 Newsgroups) dataset in the `dataset` folder.
- Customising your own dataset

Organize the bag-of-words (BoW) matrix and the vocabulary of your corpus into the form WeTe expects, following the provided `.pkl` file in the `dataset` folder and the `dataloader.py` file, and you are ready to try CLT!
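
As a rough illustration of preparing a custom corpus, the sketch below builds a vocabulary and per-document word counts from raw text and pickles them. The dictionary keys (`"bow"`, `"voc"`), the whitespace tokenizer, and the output file name are assumptions made for this example only. Check the provided `.pkl` file and `dataloader.py` for the layout the code actually expects.

```python
import os
import pickle
import tempfile
from collections import Counter


def build_bow(docs, min_count=1):
    """Build a sorted vocabulary and a dense bag-of-words count matrix."""
    # Naive lowercase/whitespace tokenization; swap in a real tokenizer as needed.
    tokenized = [doc.lower().split() for doc in docs]
    freq = Counter(tok for doc in tokenized for tok in doc)
    vocab = sorted(w for w, c in freq.items() if c >= min_count)
    word2id = {w: i for i, w in enumerate(vocab)}

    bow = []
    for doc in tokenized:
        row = [0] * len(vocab)
        for tok in doc:
            if tok in word2id:
                row[word2id[tok]] += 1
        bow.append(row)
    return bow, vocab


docs = ["the cat sat on the mat", "the dog chased the cat"]
bow, vocab = build_bow(docs)

# HYPOTHETICAL layout: these keys are placeholders, not the repository's format.
out_path = os.path.join(tempfile.gettempdir(), "my_corpus.pkl")
with open(out_path, "wb") as f:
    pickle.dump({"bow": bow, "voc": vocab}, f)
```

Loading the provided 20ng `.pkl` with `pickle.load` and printing its keys and shapes is a quick way to see which fields to mirror.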

### Pretrained word embeddings
We recommend loading the pre-trained word embeddings for better results. 
- GloVe

The pretrained GloVe word embeddings can be downloaded from [GloVe](https://cdn-lfs.huggingface.co/stanfordnlp/glove/6471382cdd837544bf3ac72497a38715e845897d265b2b424b4761832009c837).
- Alternatively, train (or fine-tune) word embeddings on your corpus with the word2vec tool.
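
To show how pretrained vectors might be aligned with a corpus vocabulary, here is a minimal sketch. It assumes the embeddings have been unpacked to the usual GloVe plain-text format (one word per line, followed by space-separated floats). The function name, the random initialization for out-of-vocabulary words, and the demo data are illustrative assumptions, not this repository's loading code.

```python
import io
import random


def load_glove_for_vocab(lines, vocab, seed=0):
    """Return one embedding row per vocabulary word, in vocabulary order."""
    wanted = set(vocab)
    found = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if word in wanted:
            found[word] = [float(v) for v in values]
    if not found:
        raise ValueError("no vocabulary word found in the embedding file")

    dim = len(next(iter(found.values())))
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word in found:
            matrix.append(found[word])
        else:
            # ASSUMPTION: missing words get a small random vector.
            matrix.append([rng.gauss(0.0, 0.1) for _ in range(dim)])
    return matrix


# Tiny in-memory stand-in for a GloVe text file; in practice pass an open
# file handle, e.g. open(path_to_glove_txt, encoding="utf-8").
demo_file = io.StringIO("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
emb = load_glove_for_vocab(demo_file, ["cat", "dog", "mat"])
print(len(emb), len(emb[0]))
```

The resulting list of rows can be converted with `torch.tensor(emb)` before being handed to a model.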

### Training
- To train, run:
```bash
python main.py
```
Change the arguments in `main.py` for different datasets and settings. The learned topics are saved in the `runs` folder.
